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Abstract. Discrete ancestral problems arising in population genetics are in- 
vestigated. In the neutral case, the duality concept has proved of particular 
interest in the understanding of backward in time ancestral process from the 
forward in time branching population dynamics. We show that duality formu- 
lae still are of great use when considering discrete non-neutral Wright-Fisher 
models. This concerns a large class of non- neutral models with completely 
monotone (CM) bias probabilities. We show that most classical bias probabil- 
ities used in the genetics literature fall within this CM class or are amenable to 
it through some 'reciprocal mechanism' which we define. Next, using elemen- 
tary algebra on CM functions, some suggested novel evolutionary mechanisms 
of potential interest are introduced and discussed, 
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1. Introduction 

Forward evolution of large populations in genetics has a long history, starting in 
the 1920s; it is closely attached to the names of R. A. Fisher and S. Wright; see T. 
Nagylaki ('1999) for historical commentaries and on the role played by the French 
geneticist G. Malecot, starting shortly before the second world war. The book of 
W. Ewens ('2004) is an excellent modern presentation of the current mathemat- 
ical theory. Coalescent theory is the corresponding backward problem, obtained 
while running the forward evolution processes backward-in-time. It was discovered 
independently by several researchers in the 1980s, but definitive formalization is 
commonly attributed to J. Kingman ('1982). Major contributions to the develop- 
ment of coalescent theory were made (among others) by P. Donnelly, R. Griffiths, 
R. Hudson, F. Tajima and S. Tavare (see the course of Tavare in Saint-Flour '2004 
for a review). It included incorporating variations in population size, mutation, 
recombination, selection... In ('1999), J. Pitman and S. Sagitov, independently, in- 
troduced coalescent processes with multiple collisions of ancestral lineages. Shortly 
later, the full class of exchangeable coalescent processes with simultaneous multiple 
mergers of ancestral lineages was discovered by M. Mohle and S. Sagitov ('2001) 
and J. Schweinsberg ('2000). All these recent developments and improvements con- 
cern chiefly the discrete neutral case and their various scaling limits in continuous 
time and/or space. As was shown by Mohle ('1994 and '1999), neutral forward 
and backward theories learn much from one another by using a concept of duality 
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introduced by T. Liggett ('1985). Backward theory in the presence of mutations 
in the forward process is well-understood, as it requires the study of a marked 
Kingman's tree (see Tavare, ('2004) for a review). In the works of C. Neuhauser 
and S. Krone ('1997), there is also some use of the duality concept in an attempt 
to understand the genealogies of a Wright-Fisher diffusion (as a limit of a discrete 
Wright-Fisher model) presenting a selection mechanism; this led these authors to 
the idea of the ancestral selection graph extending Kingman's coalescent tree of the 
neutral theory; see also T. Huillct ('2007) for related objectives in the context of 
Wright-Fisher diffusions with and without drifts. There is therefore some evidence 
that the concept of duality could help one understand the backward theory even 
in non-neutral situations when various evolutionary forces are the causes of devi- 
ation to neutrality (see J. Crow and M. Kimura, (T970), T. Maruyama ('1977), 
J. Gillepsic (T991) and W. Ewens ('2004), for a discussion on various models of 
utmost interest in population genetics). 

In this Note, we focus on discrete non-neutral Wright-Fisher (say WF) models and 
on the conditions on the bias probabilities under which forward branching popu- 
lation dynamics is directly amenable to a dual discrete ancestral coalescent. We 
emphasize that duality formulae still are of great use when considering discrete non- 
neutral Wright-Fisher models, at least for specific deviation forces to neutrality. It 
is shown that it concerns a large class of non-neutral models involving completely 
monotone bias probabilities. Several classical examples are supplied in the light of 
complete monotonicity. In the process leading us to focus on these peculiar bias 
models, some unsuspected evolutionary mechanisms of potential interest are intro- 
duced and discussed, as suggested by elementary algebra on completely monotone 
functions. We emphasize that the relevance of these novel bias mechanisms in Bi- 
ology seems to deserve additional work and confrontation with real- world problems 
is urged for to pinpoint their biological significance. 

We shall finally briefly outline the content of this manuscript. Section 2 is designed 
to fix the background and ideas: We introduce some basic facts about the discrete- 
time forward (subsection 2.2) and backward processes (subsection 2.3) arising from 
exchangeable reproduction laws (subsection 2.1). In subsection 2.4, we introduce a 
concept of duality and briefly recall its relevance to the study of the neutral case 
problem. The basic question we address in subsequent sections is whether this 
notion of duality still makes sense in non-neutral situations. We start supplying 
important non-neutral examples in section 3. In section 4, we show that duality 
does indeed make sense in the framework of discrete non-neutral Wright-Fisher 
models, but only for the class of completely-monotone state-dependent transition 
frequencies. In section 5, we show that most non-neutrality mechanisms used in the 
literature fall within this class, or are amenable to it via some 'reciprocal transfor- 
mation', starting with elementary mechanisms and ending up with more complex 
ones. In section 6, we show that duality can be used in non-neutral situations 
to compute the extinction probabilities (invariant measure) of the dual backward 
ancestral process if one knows the invariant measure (respectively, extinction prob- 
abilities) of the forward branching process. 



DUALITY AND DISCRETE NON-NEUTRAL WRIGHT-FISHER MODELS 



3 



2. Discrete-time neutral coalescent 



In this Section, to fix the background and notations, we review some well-known 
facts from the cited literature. 

2.1. Exchangeable neutral population models: Reproduction laws. (The 
Cannings model: '1974). Consider a population with non-overlapping generations 
r G Z. Assume the population size is constant, say n (n individuals (or genes)) 
over generations. Assume the random reproduction law at generation is v n := 
{vi,n,-,Vn,n) , satisfying: 

n 
m—1 

Here, v m ^ n is the number of offspring of gene m. We avoid the trivial case: v m ,n = 1, 
m = l,...,n. One iterates the reproduction over generations, while imposing the 
following additional assumptions: 

- Exchangeability: {vi, n , v n , n ) = (^(i),™, v a{n),n) , f° r & U permutations a G 

- time-homogeneity: reproduction laws are independent and identically distributed 
(iid) at each generation r G Z. 

This model therefore consists of a conservative conditioned branching Galton- 
Watson process in [n] Z , where [n] := {0, 1, ...,n} (see Karlin-McGregor, '1964). 



Famous reproduction laws are: 

Example 2.1.1 The multinomial Dirichlct model: v> n ~ Multin-Dirichlet(n; 6), 
where 9 > is a disorder parameter. With k„ := (fci, k n ), u n admits the 
following joint exchangeable distribution on the simplex |k„| := X^m=i ^ m = n: 

raw ii n! TT t^fcm 

where [0] fc = (0 + 1) ... (0 + k — 1) is the rising factorial of 9. This distribution 
can be obtained by conditioning n independent mean 1 Polya distributed random 

variables = ...,£„) on summing to n, that is to say: v n = (£„ | |£„| = n) , 
where 

F e (^ = fc) = ^ (l + 0)- fe (0/ (1 + 9)) e , k e N. 

When | oo, this distribution reduces to the Wright-Fisher model for which i/„ <~ 
Multin(n; 1/n, 1/n) . Indeed, i>» n admits the following joint exchangeable multi- 
nomial distribution on the simplex |k n | = n: 

n! • nT n 

llm=l K rn- 

This distribution can be obtained by conditioning n independent mean 1 Poisson 
distributed random variables £„ = •••,£„) on summing to n: v n = (£ n \ \£ n \ = n). 
When n is large, using Stirling formula, n\ <~ \/2nn n+1 / 2 e~ n ; it follows that 
v n ^oc with joint finite-dimensional law: P(£„ = k„) = Ilm=i f4 = n" 6 i 

n"|"oc m ' Um=i fbm ' 
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on N™. Thanks to the product form of all finite-dimensional laws of £ x , we get an 
asymptotic independence property of u n . 

Example 2.1.2 In the Moran model, v n ~ random permutation of (2, 0, 1, 1) : in 
such a model, only one new gene per generation may come to life, at the expense 
of the simultaneous disappearance of some other gene. 

2.2. Forward in time branching process. Take a sub-sample of size m from 
[n] := {0, 1, n} at generation 0. Let 



N r (to) = # offspring at generation r £ N+, forward- in-time. 

This sibship process is a discrete-time homogeneous Markov chain, with transition 
probability: 

(1) P {N r+1 (to) = k'\N r (to) = k) = P {u hn + ... + v k ,„ = k') . 

It is a martingale, with state-space {0, n}, initial state m, absorbing states {0, n} 
and transient states {1, n — 1} . The first hitting time of boundaries {0, n}, which 
is: t (to) = T{ } (m) A T{ n } (w) is finite with probability 1 and has finite mean. 
Omitting reference to any specific initial condition to, the process (N r ;r S N) has 
the transition matrix II„ with entries II„ (k,k') = P (i^i in + ... + Vk,n = k') given 
by dl]). We have U n (0, k') = <5 ,fc' and II„ (n, k') — 8 n ^> and II„ is not irreducible. 
However, II n is aperiodic and (apart from absorbing states) cannot be broken down 
into non-communicating subsets; as a result it is diagonalizable, with eigenvalues 
|A | > |Ai| > ... > |A„| and 1 = A = A x > |A 2 |. 



Example 2.2.1 (Dirichlct binomial): With U a (0, 1) — valued random variable with 
density beta(fc6', (n — k) 8) 

p K» + + v k , n = k )={ k/ ) ^ ~ i v^v ( } 

which is a beta mixture of the binomial distribution Bin(?i, u) . 

Example 2.2.2 The Wright-Fisher model has a Bin(n, k/n) transition matrix: 

P(AT r+1 (to) = k' | N r (to) = k) = (?) (l-^-Y " . 

Remark (statistical symmetry): Due to exchangeability of the reproduction law, 
neutral models are symmetric in the following sense: The transition probabilities 
of N r (m) := n — N r (to) are equal to the transition probabilities of N r (to). □ 

2.3. Backward in time process, (neutral coalescent) 

The coalescent backward process can be defined as follows: Take a sub-sample of 
size m from [n] at generation 0. Identify two individuals from [to] at each step if 
they share a common ancestor one generation backward-in-time. This defines an 
equivalence relation between 2 genes from the set [to] . Define the induced ancestral 
backward process: 

A r (to) € £ m = {equivalence classes (partitions) of [to]} , r € N, 
backward-in-timc. 
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The ancestral process is a discrete-time Markov chain with transition probability: 

P (A+i (m) = a | A r (to) = (3) = Pp- a : with (a, (i) e £ m , a C /3 

where, with a = |a| = number of equivalence classes of a, b = |/3| = num- 
ber of equivalence classes of (3, b a := (6i,...,6 a ) clusters sizes of /? and (m) a := 
m (ra — 1) ... (m — o + 1) a falling factorial, 

Pfta = < (ba) = ^(^j 

is the probability of a b a — merger. This is the probability that b randomly chosen 
individuals out of n have a < b distinct parents, c merging classes and cluster sizes 
bi > ... > b c > 2, b c+1 = ... = b a = l. 

If c = 1: a unique multiple collision occurs of order b\ > 2. 

If 6i = 2: a simple binary collision occurs involving only two clusters. 

If c > 1, simultaneous multiple collisions of orders b\ > ... > b c > 2 occur. 

Thus, the jump's height of a transition b — ► a is b—a = Yli=i (^» — -0 i corresponding 
to a partition of integer b — a into c summands, each > 1. 

The chain's state-space is: {equivalence relations on (partitions of) {1, ...,m}}; it 
has dimension B m := Y^iLo ^m,l ( a Bell number), where S m .i are the second-kind 
Stirling numbers. 

The chain has initial state Aq = {(1) , (m)}, and terminal absorbing state 
{(l,...,m)}. 

Examples: 

From the Dirichlct example 2.2.1, we get: P$ (b„) = IIJUi [0] 6m - 

From the WF example 2.2.2: In this case, Pf; n J (b ) = ^f - is the uniform distribu- 
tion on {b a : bi + ... + b a = b}. 

The ancestral Count Process: Let 

A r (m) = # ancestors at generation r e N, backward-in-time. 

Then: A r (m) = # blocks of A r (m) . 

The backward ancestral count process is a discrete-time Markov chain with transi- 
tion probabilities (Cannings, '1974 and Gladstien T978): 

(2) P (A r+1 (m) = a \ A r (m) = b) = P% := £ -f^>. 

oi....,6 a GN + 

b 1 +..-+6 a = b 




This Markov chain has state-space {0, ...,m}, initial state m, absorbing states 
{0, 1} . The process (A r ; reN) has the transition matrix P n with entries P n (b, a) = 
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Pj"' given by |(5J). Note, by inclusion-exclusion principle, the alternative alternating 
expression of P^ : 

2.4. Duality (neutral case). We start with a definition of the duality concept 
which is relevant to our purposes. 

Definition [Liggett, '1985]: Two Markov processes (X^, Xf; t > 0) , with state- 
spaces (£1,82) , are said to be dual with respect to some real-valued junction $ on 
the product space £\ x £2 if Vxi € £\, Vx2 € £2, V£ > : 

(3) E^X^n*) =E X2 $(x!,X t 2 ). 



We then recall basic examples of dual processes from the neutral and exchangeable 
population models (Mohle, '1997): The neutral forward and backward processes 
(N r ,A r ;r £ N) introduced in the two preceding subsections are dual with respect 
to the hyper geometric sampling without replacement kernels: 



(4) 



(i ) $i(m,*) 



(it) (m,A:) = 



n — m 
k 



Namely (i) reads: 



E t 



E t 



and 



{0,...,7!} 2 



71 — A r 

n — m 



n 

n — m 



Call type A individuals the descendants of the m first chosen individuals (allele A) 
in the study of the forward process; type a individuals are the remaining ones (allele 
a). The left-hand-side of the above equality is an expression of the probability that 
a fc— sample (without replacement) from population of size N r at time r are all of 
type A, given iVo = m. If this fc— sample are all descendants of A r ancestors at 
time — r, this probability must be equal to the probability that a (77 — m) —sample 
from population of size A r at time — r are all of type a. This is the meaning of the 
right-hand-side . 

And (u) reads: 

'n - N r 



E„ 



n — ?7i 
. 1 



= E fc 



A r 



The left-hand-side is the probability that a fc— sample (without replacement) from 
population of size N r at time r are all of type a, given iVo = m. If this fc— sample are 
all descendants of A r ancestors at time — r, this probability must be equal to the 
probability that a 771— sample from population of size A r at time — r are themselves 
all of type a. 



With P' n the transpose of P n , a one-step (r = 1) version of these formulae is: 
(i) II„<^ = & n P' n and (77) n„$2 = <j>2p/ 
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where ($* , are nxn matrices with entries (m, fc) and $^ (m, fc) , respectively 
and (Il n , P n ) the transition matrices of forward and backward processes. Note that 
the matrix <& n is symmetric and left-upper triangular. The matrices and $ 2 are 
both invertible, with respective entries 

[*y- 1 («,i)=(-i) w (j)(?) 

and 

wr 1 («)=(-!)«- (;)=(-i)«- („'_,)(;). 

The matrix [& n ] 1 is left-lower triangular, while [$^] 1 is symmetric right-lower 
triangular. Thus, 

(i) -1 n„$i = P^ and (it) [$ 2 J ~ x n„$2 = P ' n . 

In any case, being similar matrices, II n and P^ (or P„) both share the same eigen- 
values. If R n diagonalizing II „ is known so that P~ 1 n„P„ = A„ := diag(Ao, X n ) , 
the diagonal matrix of the eigenvalues of H n , then, with «3> n = or Q n , R n := 
$~ 1 P n diagonalizes P^ and is obtained for free (and conversely). R n is the matrix 
whose columns are the right-eigenvectors of IT„ and R n is the matrix whose columns 
(rows) are the right-eigenvectors (left-eigenvectors) of P' n (of P n ). Similarly, if L n 
is the matrix whose rows are the left-eigenvectors of II n , L n :— L„$ n is the ma- 
trix whose rows (columns) are the left-eigenvectors (right-eigenvectors) of P' n (of 
P„). With l' k the k— th row of L n and the fc— th column of R n , the spectral 
decomposition of II n is: 

whereas, with ij. the fc— th column of L n and r^. the fc— th row of R ni the one of P n 
reads: 

fc= o r fe ? fc fc=o (*n r fc ) 



In Mohle T999, a direct combinatorial proof of the duality result can be found (in 
the general exchangeable or neutral case); it was obtained by directly checking the 
consistency of ©, © and ©■ 

The duality formulae allow one to deduce the probabilistic structure of one process 
from the one of the other. The question we address now is: does duality still 
make sense in non-neutral situations? We shall see that it does in discrete non- 
neutral Wright-Fisher models, but only for some class of state-dependent transition 
frequencies. 

3. Beyond neutrality (symmetry breaking) 

Discrete forward non-neutral models (with non-null drifts) can be obtained by sub- 
stituting 

fc — > np ( — ] in P (i/i iTl + ... + v k>n = fc') , 
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where: 

p(x) : x e (0, 1) — ► (0, 1) is continuous, increasing, with p (0) = 0, p (1) = 1. 

p (x) is the state-dependent Bernoulli bias probability different from identity x (as 
in neutral case). 

When particularized to the WF model, this leads to the biased transition probabil- 
ities: 

F(N r+1 (m) = k' \ N r (m) = k) = ^p(^j (l-p(^" " 

In this binomial n— sampling with replacement model, a type A individual is drawn 
with probability p (^) which is different from the uniform distribution k/n, due to 
bias effects. 

From this, we conclude (a symmetry breaking property): The transition probabili- 
ties of N r (to) := n — N r (to), r G N are 

Bin (n, 1 — p (1 — k/n)) ^ Bin (n,p(k/n)) , 

and so, no longer coincide with the ones of (N r (to) ;r € N) . The process N r (m), 
r E N no longer is a martingale. Rather, if x —> p (x) is concave (convex), N r (to), 
r 6 N is a submartingalc (supcrmartingale), because: E(N r+ i(m) | N r (m)) = 
np (N r (to) jn) > A^ r (to) (respectively < N r (to)). 

In the binomial neutral Wright-Fisher transition probabilities, we replaced the suc- 
cess probability ^ by a more general function p(^)- However, this replacement 
leaves open the question what model is in the background and what quantity the 
process (N r ,r G N) really counts. A concrete model in terms of offspring vari- 
ables must be provided instead. To address this question, we emphasize that 
the reproduction law corresponding to the biased binomial model is multinomial 

and asymmetric, namely: u n <~ Multin(n; 7r„), where ir n := (iri !n , Tr n>n ) and: 
7r m n — p (—) — p f 22 ^-), to = 1, ...,n. We note that under our hypothesis, 



m—1 



TT m , n =P(1) ~P(0) = 1. 



Due to its asymmetry, the law of the biased u n no longer is exchangeable. 
We now recall some well-known bias examples arising in population genetics. 

Example 3.1 (homographic model, selection). Assume 
(5) p(x) = (1 + s)x/ (1 + sx) , 

where s > — 1 is a selection parameter. This model arises when gene A (respectively 
a), with frequency x (respectively 1 — x), has fitness 1 + s (respectively 1). The 
case s > arises when gene of type A is selectively advantageous, whereas it is 
disadvantageous when s € (—1, 0) . 

Example 3.2 (selection with dominance). Assume 

(1 + s) x 2 + (1 + sh) x(l-x) 



(6) p(x) 



1 + sx 2 + 2shx (1 - x) 
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In this model, genotype AA (respectively Aa and aa), with frequency x 2 (respec- 
tively 2x (1 — x) and (1 — x) 2 ) has fitness 1 + 3 (respectively 1 + sh and 1). h is 
a measure of the degree of dominance of heterozygote Aa. We impose s > — 1 
and sh > — 1. Note that the latter quantity can be put into the canonical form of 
deviation to neutrality: 

, , , h-x(2h-l) 

p (x) = x + sx (1 — x) 



1 + sx 2 + 2shx (1 - x) 

where the ratio appearing in the right-hand-side is the ratio of the difference of 
marginal fitnesses of A and a to their mean fitness. The case h = 1/2 corresponds 
to balancing selection with: p (x) = x + § ^-pp^r • 

Example 3.3 (quadratic model) With c e [— 1, 1] , a curvature parameter, one may 
choose: 

(7) p(x) = x (1 + c — cx) . 

If c = 1, p (x) = x (2 — x) = 1 — (1 — x) 2 : this bias appears in a discrete 2-sex 
population model (Mohle, T994, T998)). We shall give below an interpretation of 
this quadratic model when c € (0, 1] in terms of a joint one-way mutations and 
neutrality effects model. 

We can relax the assumption p (0) = 0, p (1) = 1 by assuming < p (0) < p (1) < 1, 

p(i)-p(0)e[0,i). 

Example 3.4 (affine model) Take for example 

(8) p (x) = (1 - fj, 2 ) x + m (1 - x) , 

where (/U l5 /i 2 ) are mutation probabilities, satisfying \i x < 1 — fi 2 - It corresponds 

Mi 

to the mutation scheme: a A. To avoid discussions of intermediate cases, 

we will assume that p(0) = Hi > and p(l) < 1 (/i 2 > 0). In this case, the 
matrix II„ is irreducible and even primitive and all states of this Markov chain 
are now recurrent. We have P(AV+i > | N r = 0) = 1 - (1 -p(0)) n > and 
P (N r+ i <n\N r = n) = l— p (1)™ > and the boundaries {0} and {n} no longer 
are strictly absorbing as there is a positive reflection probability inside the domain 
{0,l,...,n}. 



For reasons to appear now, we shall be only interested in functions q such that 
q (x) := 1—p (x) is a completely monotone function (CM) on (0, 1) that is, satisfying: 

(-1)V° (x) > 0, for all x e (0,1), 

for all order-/ derivatives of q, I > 0. If p (x) is such that q is CM, we shall call 
it an admissible bias mechanism. 



4. NON-NEUTRAL WRIGHT-FlSHER MODELS AND DUALITY 

Preliminaries: Let v„ := (v (0) , v (1) , v (n)) be a (n + 1) —vector of [0, 1] —valued 
numbers. Define the backward difference operator V acting on v n by: Vw (m) = 
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v (m) — v (m — 1), m = 1, n. We have V°w (to) = v (to) , V 2 w (to) = v (m) — 
2v (to — 1) + v (m — 2) , etc., and, starting from the endpoint v (n) 

W j v(m) \ m=n =J2(-iy- 1 fyv(n-l),j = 0,...,n. 

Let u be some continuous function: [0, 1] — > [0, 1] . Consider the (n + 1) —vector 
u„ := (u (-) , (— ) , (-)), sampling u at points m/n. The function u is 
said to be V— completely monotonic if (— 1) J V 3 u (^) | m =n> 0, for all j = 0, n 
and all n > 0. Let (m 1 ,?/ 2 ) be two continuous functions on [0, 1]. Let u = u 1 ■ u 2 . 
With u n the point- wise product of u* and u 2 , assuming both functions (u 1 , u 2 ) to 
be V— completely monotonic, so will be u, by the Leibniz rule. In particular, if u is 
V— completely monotonic, so will be its integral powers u\ ieM. Our main result 
is: 

Theorem: Consider a non-neutral WF forward model (N r ;reN) on {0, ...,n}, 
with continuous, non- decreasing bias p{x) , satisfying: 

o<p(o)<p(i)<i,p(i)-p(o)e[o,i]. 

This process has forward transition matrix: 

n n (fc,fc')=PKn + ... + ^n = fc')=(^)p(^) (l-p{^j ■ 

There exists a Markov chain (A r ; r e N) on {0, n} such that (N r , A r ; r e N) are 
dual with respect to $ 2 (to, k) = ( n ~ k m ) / (£) */ flic! on/?/ if: x ^> q (x) ~ 1 — p (x) 
is completely monotone on (0, 1). In this case, the transition probability matrix of 
(A r ;r e N) is: 

m«>-C)£<-^(?).H)'^ 

P„ is a stochastic matrix if and only if p (0) = 0; else, if p (0) > 0, the matrix P n 
is sub- stochastic. 



Proof: Developing [$ 2 ] 1 II„$ 2 = P^, we easily obtain: 



Pn(i,j) 



£(-*) 



3-1 (J 



l-p 



This entry is non-negative if and only if (—1)"' V J ' (^)*^ |m=n> 0, for all i, j = 
0, ...,n. But, due to the above argument on V— complete monotonicity of integral 
powers, this will be the case if and only if (— 1) J V J (q (^)) | m =n> 0, for all j = 
0, ...,n. As this must be true for arbitrary value of population size n, function q 
has to be V— completely monotonic. Adapting now the arguments of Theorem 2 
developed in Feller '1971, page 223, for absolutely monotone functions on (0,1), 
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this will be the case if and only if x — > q (x) := 1 — p (x) is a completely monotone 
function on (0,1) in the sense that: 

(-1)' q {l) (x) > 0, for all x G (0, 1) , I G N. 
Next, since (I — V) u (m) = u (m — 1) is a simple back-shift, 

E p » & o = £ F « = ( J - y )" ( « © i™=»= « (or 

and, if q is CM, P n is a stochastic matrix if and only if g (0) = 1; else, if q (0) < 1, 
the matrix P„ is sub-stochastic. 

We note that the first column of the matrix P n is P n (i,0) — q (1)* whereas its first 
line is: P n (0,j) — 8oj, expressing, as required, that the state of (A r ;r G N) is 
absorbing. □ 

5. Examples 

We show here that most of the simplest non-neutrality mechanisms used in the 
literature fall within the class which we would like to draw the attention on, or are 
amenable to it via some 'reciprocal transformation' which we define. Elementary 
algebraic manipulations on CM functions allows to exhibit a vast class of unsus- 
pected mechanisms. Note that in some cases, their biological relevance remains to 
be elucidated. The results presented in this Section seem to be new. They serve as 
an illustration of our theorem. 

5.1. Elementary mechanisms. Assume first p (x) = x corresponding to the sim- 
ple neutral case. Then q(x) = 1 — x is completely monotone on (0, 1). With Si.j the 
second kind Stirling numbers, we get a lower left triangular stochastic transition 
matrix 

Pniu) = QE^r'Q^^K-^-^'^^ 

Pn(i,j) = 0, else. 

The diagonal terms (eigenvalues) are all distinct with P n = (n). ■ n~ % . The 
matrix P n is stochastic. Due to triangularity, ancestral process is a pure death 
Markov process which may be viewed as a discrete coalescence tree. 

From example 3.4 (mutation). Assume @ holds: p(x) = (1 — jti 2 ) x + fx-y (1 — x) 
where (aH,/^) are mutation probabilities. Then, with k := 1 — (/ij +^2)1 Q ( x ) = 
1 — fly— kx is completely monotone on (0, 1) if and only fiy < 1 — fj, 2 (n > 0). In 
this case, P n is again lower left triangular (a pure death process). We have 

(9) Pn = Q E Q + =: H ' ' 5 G («/») ' 3< * 

P n = 0, else, 
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in terms of generalized Stirling numbers (n/n). We have P n = (n) i {^) % 
and the spectrum of P n is real. When > 0, this matrix is sub-stochastic with 

E"=o P « = (1- Mi)*- 

A particular case deals with one-way mutations (fi 1 + [i 2 > 0, fi 1 ■ fi 2 = 0): 

If n 2 = 0' Pn(i>j) = (1 — Mi)* ' ( rl )j ' ' j < i> = 0, else. Further, 

e; =0 ^(m) = (i-mi) 1 <i. 

If ^ x = 0, P n = (n)j ■ n~ l ■ ((1 — n 2 ) / n )> J — h = 0, else. The correspond- 
ing matrix P n is stochastic. 

From example 3.3 (quadratic). Assume p (x) = x(l + c— cx), as in ([7]). Then 
q (x) = (1 — x) (1 — cx) which is completely monotone if and only if c G [0, 1]. The 
case c = is the neutral case, whereas c = 1 appears in a 2-sex model of Mohle. 
In this quadratic case, since V J (^)*^ = if j > 2i, then P n — if j > 2i 

and so P n is a Hessenberg-like matrix. Note that X)"_ (*>i) = Q (0)* = 1- 

From the selection example 3.1, when ([5]) holds 

p(x) = (1 + s) x/(l + sx), 

q (x) — 1 — p (x) = (1 — x) j (1 + sx) is CM whenever selection parameter s > 0. 
The induced matrix P„ is stochastic. It is no longer lower left triangular so that 
the ancestral no longer is a pure death process, rather a birth and death process. 
The induced coalescence pattern no longer is a discrete tree, but rather a graph (a 
discrete version of the ancestral selection graph of Neuhauser-Krone T997). 

From example 3.2 (selection with dominance). The corresponding mechanism ^ 
with parameters (s, h) satisfying s > —1 and sh > —1 is CM if and only if s > 
and h £ (0,1/2). The case h £ (0,1) corresponds to directional selection where 
genotype AA has highest fitness compared to aa's and the heterozygote class Aa 
has intermediate fitness compared to both homozygote classes. In this situation, 
marginal fitness of A exceeds the one of a and selective sweep is expected. When 
h € (0, 1/2), allele A is dominant to a, whereas when h £ (1/2, 1), allele A is reces- 
sive to a (a stabilizing effect slowing down the sweep). Critical value h — 1/2 is a 
case of pure genie balancing selection. 

Example 5.1.1 Consider the mechanism p(x) = x 1 for some 7 > 0. The function 
q (x) — 1 — p (x) is CM if and only if 7 £ (0, 1) . Although this model seems quite 
appealing, we could find no reference to it in the specialized mathematical genetics' 
literature. 

Example 5.1.2 (Reciprocal mechanism) If p (x) is not admissible in that q is not CM, 
it can be that p (x) := 1— p(l — x) is itself admissible. As observed before, i{N r (m) 
has transition probabilities given by Bin(n,p (k/n)) , p{x) arises in the transition 
probabilities of N r (m) :=n — N r (m) . Indeed, such transitions are Bin(n,J> (k/n)) 
distributed. 
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If p{x) is the selection mechanism of example 3.1, ([5]), with s € (—1,0) (not ad- 
missible), p(x) = (1 + s) x/ (1 + sx) is itself an admissible selection mechanism 
because it has reciprocal selection parameter s — —s/ (1 + s) > 0. If p(x) is the 
mechanism of example 3.2, namely (j6)), with parameters (s, ft,), then p(x) is itself 
a selection with dominance mechanism with reciprocal parameters s = —s/ (1 + s) 
and h = 1 — h. Assuming (s < 0, h G (1/2, 1)), p (x) is not admissible whereas p{x) 
is because s > and /i e (0,1/2). Similarly, when 7 S (0, 1), the mechanism p (x) = 
1 — (1 — a;) 7 is not admissible but, from example 5.1.1, p(x) := 1 — p (1 — x) = x 7 
is. 



5.2. Bias mechanisms with mutational effects. Let pu (x) = (1 — p 2 ) x + 

p 1 (1 — x) be the mutational bias mechanism (with re = 1 — (/^ + /j 2 ) > 0). Let 
p (x) be a bias mechanism such that q (x) is CM with p(l) — p(0) — 1. Then 

Pm (a;) = Pm (p(z)) 

is such that qM (x) := 1 — pu (x) is CM. It is therefore admissible and adds muta- 
tional effects to the primary mechanism p (x) . For example, 

~ , s /i 1 +x((l + .s)(l-^ 2 )-^ 1 ) 

Pm (X) = — 

1 + .S3: 

is a mechanism of selection combined with mutational effects. We have pu (0) = 
Mi) Pm (1) = 1 — A*2- The mechanisms pu (x) obtained in this way all share the 
specificity: p M (1) - Pm (0) =: re < 1. 

Note that, except for the mutational affine mechanism, it is not true in general that 
whenever p 1 (x) and p 2 (x) are two admissible bias mechanisms, then p 1 (p 2 (x)) is 
admissible. 



5.3. Joint bias effects and Compound bias. Let p 1 (x) and p 2 (x) be two ad- 
missible bias in that q 1 (x) := 1 — p 1 (x) and q 2 (x) := 1 — p 2 (x) are both completely 
monotone. Then 

q ( x ) = q 1 ( x ) q 2 ( x ) is CM. 

Thus, with xi o X2 '■= xi + x 2 — X1X2, the probabilistic product in [0, 1] 

(p 1 {x) , P 2 (x)) -> p (x) = p 1 (x) o p 2 (x) . 

Whenever a WF model is considered with bias p (x) = p 1 (x) o p 2 (x) obtained from 
two distinct bias p 1 (x) and p 2 (x) , we call it a WF model with joint bias effect. 



Example 5.3.1 (Joint selection and mutational effects). Let p 1 (x) = pm (x) and 
p 2 (x) = (1 + s) x/ (1 + sx). We get 

, . (1 — Ui — rex) (1 — x) , , , Ui + x (s + 1 - u-i + re) - rex 2 

q (x) = — and p (x) = — , 

1 + sx 1 + sx 

with p (0) = ^ij, p (1) = 1. This mechanism differs from the traditional mechanism 
of selection combined with mutational effects. 

Example 5.3.2 (Joint mutation and neutral effects). Let p 1 (x) = (1 — ^t 2 )x + 
H l (1 — x) and p 2 (x) = x. We get 

q (x) = (1 — x) (1 — p 1 — rex) and p (x) = p 1 + x (1 — fi 1 + re (1 — x)) , 



14 



THIERRY E. HUILLET 



with p (0) = fi 1 , p(l) = 1. When fi ± — (one-way mutations), we recover the 
quadratic mechanism with curvature parameter c = 1 — fi 2 - This finding justifies 
some interest into the quadratic mechanisms with c =/= 1 . 

With j = 1,2, the reproduction law of each elementary effect is u J n ~ Multin(n; n J n ) , 
where tt^„ = pi (s) - pi m = 1, Then, ~ Multin(n; tt„) , 7r mi „ = 

P ( r?) ~P l 22 ^")' m — 1) where 7r„ := 7r* 7r^ is easily obtained component- 
wise by: 

m m 
i = l /=1 

We let: v n := v\ v\ ~ Multin(n; 7r T \ 7r^). It is the reproduction law of a WF 
model obtained jointly from the two bias p (x) and p 2 (x). 

Let (f> (x) : (0, 1) — * (0, 1) be an absolutely monotone function satisfying: <fr^ (x) > 
for all I— th derivatives (jr- 1 ' of cf>, all x G (0,1). Such functions are well-known 
to be probability generating functions (pgfs) of N— valued random variables, say 
N, that is to say: 4>{x) = E [x w ] . Clearly, if q is CM on (0,1), then so is: 
<?0 (x) := <j> (q (x)) . Thus p^ (x) := 1 — <j) (1 — p (x)) is an admissible bias mech- 
anism in that q^ (x) := 1 — P0 (x) is CM. We call it a compound bias. 

Example 5.3.3 The general mechanism with mutational effects is in this class. In- 
deed, 

Qm (x) = 1 - p M (x) = 1 -p M (p (as)) = 1 - Pm (1 - q (x)) 

and so (x) = 1 — pu (1 — x) = 1 — (1 — /i 2 ) (1 — x) — /i x x = /i 2 + kx which is 
absolutely monotone as soon as k = 1 — (/Xj + /x 2 ) > 0. 

Example 5.34 With > 0, taking <j> (x) = e" 9 ^-*) or (e 9 * - l) / (e e - l), the 
pgf of a Poisson (or shifted-Poisson) random variable, p^ (x) = 1 — 0(1 —p(x)) 
is admissible if p (x) is. Note that if q is of the form q^, where </> is the pgf of 
a Poisson random variable, then q^ (x) Q is admissible for all a > 0, a property 
reminiscent of infinite divisibility for pgfs. Taking <f>(x) = (1 — it) j (1 — 7rx) or 
x (1 — 7r) / (1 — 7rx), 7r e (0, 1), the pgf of a geometric (or shifted-geometric) random 
variable, p^ (x) = sp (x) / (1 + sp (x)) or (s + l)p(x) / (1 + sp (x)) is admissible if 
p (x) is (with s = 7r/ (1 — 7r) > 0). In the external latter mechanism, one recognizes 
the one in ([5]) occurring in the model with selection of example 3.1. 

Example 5.3.5 Let p (x) = x 7 with 7 £ (0, 1) as in example 5.1.1. Then p^ (x) = 1 — 
q$ (x) where q<f, (x) = e - ^ 1-9 ^) = e~ Bx ~' , 9 > 0, is admissible. Note that (x) ~ 

xlO 

Ox 1 . The reciprocal function p^ (x) = q<f, (1 — x) = e -6 ^ 1 ^ 7 also interprets as an 
absolutely monotone discrete-stable pgf (see Steutel, van Harn, T979). It is not 
admissible. 

Proceeding in this way, one can produce a wealth of admissible bias probabilities p^, 
the signification of which in Population Genetics remaining though to be pinpointed, 
in each specific case study. 
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6. Limit laws 



Consider a WF model (N r : r E N) on {0, n} with forward transition matrix: 



n—k' 



with admissible bias p (x) . Define (A r ; r e N) as the dual Markov chain on {0, n} 
with transition probability: 



'.(«)=("]E(-r'j]«|.-j 



Then, (N r ,A r ;r e N) arc dual with respect to (m, k) := $^ ( m ^) = ("~ m )/("), 
to wit: 



m 



We shall distinguish two cases. 



Case 1: Assume first that 

N r — > iVoo as r | oo, independently of Ao = m > 1. 

Let 7TQO (i) = P(Aoo = i) and 7^ = (71-00 (0) , ...,7roo (n)) . The line vector 71^ is 
the left eigenvector of II „ associated to the eigenvalue f : tv'^ — n'^Un. It is the 
(unique) invariant probability measure (stationary distribution) of (N r ;r € N) . 

If this stationary distribution exists, then, using duality formula, necessarily, A r — ► 
as r | 00 with probability {A^ = 0) =: p^ (k) < 1. The numbers p x (k) are 
the extinction probabilities of the dual process started at k. As is well-known, — 
(Poo (0) , p x (n))' is the unique solution to (I - P n ) p oo = with p ra (0) = 1. 



Remark: Typical situations where (N r ; r € N) has an invariant measure is when 
mutational effects are present, and more generally when the bias mechanism sat- 
isfies p(0) > and p(l) < 1. In this situation, the forward stochastic transition 
matrix IT n has an algebraically simple dominant eigenvalue 1. By Perron- Frobenius 
theorem: 

limn; = Itt'^, 

where 1' = (1, 1) . The invariant probability measure can be approximated by 
subsequent iterates of II„, the convergence being exponentially fast, with rate gov- 
erned by the second largest eigenvalue. Of course, detailed balance (stating that 
TTkH-n (k, k') = 7Tfe'II n (k 1 , kj) does not hold here and the forward chain in equilib- 
rium is not time-reversible. 

In these recurrent cases, the dual ancestral process A r started at k gets extinct 
with probability p^ (k). The numbers 1 — p x (k) are the probabilities that it gets 
killed before getting extinct; in other words, 1 — p M (k) are the probabilities that 
A r first hits an extra coffin state, say {d} , before hitting {0}. □ 
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In terms of moments, by the duality formula, we conclude that: 

-l 



E 



n — N a 
k 



= Poo (k) = P fe (Aoo = 0) , 



relating factorial moments of n — to the extinction probabilities of A r given 
Aq = k. We also have 



k=0 



n - Nov 
k 



E 



(1 + v) 



n-N a 



]-± 

fe=0 v 7 



and so, the probability generating function of can be expressed as (u e [0, 1]): 



E i uNaa ] = E (fc) Poo (k) u n - k (1 - u) k , 



in terms of the Bernstein-Bezier polynomial of ( Poo {n — k); k = 0, n) . 

Let p^ — ( Poo (0) , Poo (n))' . The vector p^ is the right eigenvector of P n associ- 
ated to the eigenvalue 1 : p x = Pnp^. In this case, the matrix P n is sub-stochastic 
and the extinction probability of (A r ; r E N) given Aq = k is less than one. Thanks 
to duality, we have: 

n„$„ = § n p' n . 

where the matrix <!>„ is symmetric whereas the matrix cl?" 1 is symmetric right- lower 
triangular, with: 



n — m\ In 
k )'\k 



(m, k) 



n — k\,fn 
m ) \m 



(-1) 



i+j—n I J 



n — i 



Thus, 



7r' TT $ = 7r' $ = 7r' $ P' 



showing that p^ and it^ are related through: 

Poo = ^TToo Or TToo = ^Voc- 

77ie knowledge of the invariant measure of the forward process allows one to 
compute the extinction probabilities p x of the dual backward ancestral process ( and 
conversely). 



Example: Consider the discrete WF model with mutations of example 3.4. In this 

case, N r —* iVoo as r j oo, regardless oi Nq — m and (N r ;r £ N) has an invari- 
ant measure which is difficult to compute. Looking at the backward process, the 
matrix P n is sub-stochastic (if n l > 0) and lower-left triangular. Due to triangu- 
larity, the right eigenvector p^ of P n can easily be computed explicitly in terms of 
(P n (i,j) ; j < i) ,i = 0, n. We therefore get the following alternating expression 
for the invariant measure: 



71" oo 00 



£(-*) 



: 'jjPo.in-j). 
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Concerning moments, for instance, we have p M (1) = \i 2 j (/^ + /U 2 ) so that E [iVoc] — 
n^/ + n 2 )', from ([9]) we also have: 

fo s ^2 [nfH (1 + K ) + k 2 ] 1 ( , ,x (2n - 1) ^ r 2 

£»„ (2) = 7- ; r T^r = 71 (71 — 1 ) — 71 - + E UV~_ 

[(1 - «) (n - (n - 1) « 2 )] ' Mi + M 2 L °° 

allowing to compute E [iV^l and then the variance of N^. We get: 

* 2 (N 00 ) = + (n) ^^n, 

( M i+/i 2 ) 2 (2n(/i 1 + M2 ) + l) ' "Too 2(/i 1 + M2 ) 3 

suggesting (when /i!/i 2 > 0) a Central Limit Theorem for N x as n grows large: 
V™ V Mi+Ma/ntoo V 2(/i 1 +^ 2 ) J / 



Case 2. Conversely, assume now that given Nq = to 



as r | oo , with probability P m (iVoo = 0) =: p m (m) . 



so that boundaries {0,n} are absorbing. Then, the ancestral process {A r \r € N) 
possesses an invariant distribution, in that: 

as r f oo, independently of A$ — k G [n] . 
In terms of moments, the duality formula means that: 



E 



n - A c 



Poo (to) = P m = 0) , 



relating to— factorial moments of n — A^ to the extinction probabilities of N r given 
TVq = to. Stated differently, the probability generating function of Aoo is (u 6 [0, 1]): 



Let 7TOO (i) = W(A 00 = z), with ir'^ — Tr' 00 P n . Then, using duality, p m is the right 
eigenvector of H n associated to the eigenvalue 1 : p M = II^p^. Thus, p M and tv^ 
are related through: 

Poo = ^n^oo Or TToo = ^ Poo- 

The knowledge of the extinction probabilities of the forward process allows one 
to compute the invariant measure ■n^ of the dual backward ancestral process ( and 
conversely). 



Examples: Typical situations where boundaries {0, n} are absorbing to (N r ; r G N) 
occur when p (0) = and p (1) = 1. The simplest case is the neutral case, but the 
non-neutral selection and selection with dominance mechanisms or the quadratic 
mechanism (examples 3.1, 3.2 and 3.3) are also in this class. For instance: 
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(i) In the neutral case, p x (to) = 1-m/n. Thus, tt^ (i) = (") Y?j=o 3 Q) n = 

<5i,i and A — > 1 as r | oo, the degenerate state reached when the most recent 
common ancestor (MRCA) is attained. 

(ii) Non-degenerate solutions of Aoo are obtained when considering bias mechanisms 
with p (0) = and p (1) = 1. 

(Hi) Consider any biased WF model with p (0) = and p (1) = 1 for which p (x) <~ 

£C|0 

Ax, A > 1. Then, due to large sample asymptotic independence: 

where ^ is an iid sequence with ^ <~ Poisson(A) (as it can easily be checked by 
the Poisson limit to the binomial distribution). In this case, the limiting extinction 
probability of (JV r ;r S N) given No = m is lim n |oo Poo (to) = p m , m = 1,2,..., 
where < p < 1 is the smallest solution to the fixed point equation 

x = e- x(1 - x K 

p is the singleton extinction probability of a super-critical Galton- Watson process 
with offspring distribution Poisson(A). More precisely, proceeding as in Mohle 
'1994, Theorem 4.5, we have 

showing that the convergence of p^ (m) to p m is of order n . As a result, we get 
the asymptotic normality: 

■^(^oo-n(l-p)) fAfU^A). 
Vn «T°c \ 1 + \p J 

Intuitively, ±E [n - Ao] = Poo (1) = ¥ 1 (N^ = 0) -» p, showing that E [Ac] ~ 

n|oo 

n (1 — p) and 

ra(n 1 _ 1) E t( n - 4») (n - 1 - Ac)] = Poo (2) = F 2 (TVoo = 0) -> p 2 , 
showing that ct 2 (Ax,) ~ np (1 — p) / (1 + Ap) . 

n|oo 

For the quadratic example 3.3, p(x) = x (1 + c — ex), with c e [0, 1], A = 1 + c > 1 
as soon as c > 0. When c € (0, 1], we thus always have asymptotic normality. 
For the example 3.1 with selection, p(x) = (1 + s) xj (1 + sx) , with s > —1 : 
p (x) ~ (1 + s) x and so A = 1 + s. We have asymptotic normality only when 

s > 0, i.e. when the fitness is advantageous (corresponding as required to complete 
monotonicity of corresponding q = 1 — p) . 

Note that this asymptotic behavior does not hold for the Lipshitz-continuous admis- 
sible mechanism p (x) = x 1 of example 5.1.1 (or more generally for mechanisms sat- 
isfying p (x) ~ 0x~t , 6 > as in the compound bias example 5.3.5) with 7 e (0, 1) 

because its behavior near is not linear. This puzzling class of models seems to 
deserve a special study as deviation to normality is expected. We postpone it to a 
future work. 
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7. Concluding remarks 

In this Note, we focused on discrete non-neutral Wright-Fisher models and on the 
conditions on the bias probabilities under which the forward branching dynamics 
is amenable to a dual discrete ancestral coalescent. It was shown that it concerns a 
large class of non- neutral models involving completely monotone bias probabilities. 
Several examples were supplied, some standard, some less classical. The Wright- 
Fisher model with forward binomial transition matrix is a particular instance of the 
Dirichlet model with Dirichlct-binomial transition matrix. Following the same lines, 
using the representation of the Dirichlet binomial distribution as a beta mixture 
of the binomial distribution, it would be interesting to exhibit the corresponding 
conditions on the bias mechanism, were the starting point to be a forward Dirichlet 
branching process. Also of particular interest in this respect would be the discrete 
non-neutral Moran models whose forward transition matrices are simpler because 
of their tridiagonal Jacobi structure. We hope to be able to consider shortly these 
cases (and maybe others) in a future work. 
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